Overview

Dataset statistics

Number of variables: 16
Number of observations: 29531
Missing cells: 88488
Missing cells (%): 18.7%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 10.3 MiB
Average record size in memory: 366.2 B

Variable types

Numeric (NUM): 13
Categorical (CAT): 3

Reproduction

Analysis started: 2020-11-25 16:00:18.345442
Analysis finished: 2020-11-25 16:01:05.103102
Duration: 46.76 seconds
Software version: pandas-profiling v2.9.0rc1
Download configuration: config.yaml
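
The reported duration can be checked from the two timestamps. A minimal sketch using only the standard library; the variable names are illustrative and not part of the report:

```python
from datetime import datetime

# Timestamps copied from the "Reproduction" block.
FMT = "%Y-%m-%d %H:%M:%S.%f"
started = datetime.strptime("2020-11-25 16:00:18.345442", FMT)
finished = datetime.strptime("2020-11-25 16:01:05.103102", FMT)

# Elapsed wall-clock time between the start and end of the analysis.
duration_seconds = (finished - started).total_seconds()
print(f"Duration: {duration_seconds:.2f} seconds")
```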

Warnings

Date has a high cardinality: 2009 distinct values [High cardinality]
PM2.5 has 4598 (15.6%) missing values [Missing]
PM10 has 11140 (37.7%) missing values [Missing]
NO has 3582 (12.1%) missing values [Missing]
NO2 has 3585 (12.1%) missing values [Missing]
NOx has 4185 (14.2%) missing values [Missing]
NH3 has 10328 (35.0%) missing values [Missing]
CO has 2059 (7.0%) missing values [Missing]
SO2 has 3854 (13.1%) missing values [Missing]
O3 has 4022 (13.6%) missing values [Missing]
Benzene has 5623 (19.0%) missing values [Missing]
Toluene has 8041 (27.2%) missing values [Missing]
Xylene has 18109 (61.3%) missing values [Missing]
AQI has 4681 (15.9%) missing values [Missing]
AQI_Bucket has 4681 (15.9%) missing values [Missing]
Benzene is highly skewed (γ1 = 21.3042) [Skewed]
NOx has 740 (2.5%) zeros [Zeros]
CO has 2328 (7.9%) zeros [Zeros]
Benzene has 3802 (12.9%) zeros [Zeros]
Toluene has 2861 (9.7%) zeros [Zeros]
Xylene has 1747 (5.9%) zeros [Zeros]
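
The per-variable missing counts in these warnings add up to the "Missing cells" total in the overview, since City and Date report no missing values. A small sketch of that arithmetic; the dictionary is transcribed from the warnings and the names are illustrative:

```python
# Per-variable missing counts, copied from the warnings above.
missing_by_variable = {
    "PM2.5": 4598, "PM10": 11140, "NO": 3582, "NO2": 3585, "NOx": 4185,
    "NH3": 10328, "CO": 2059, "SO2": 3854, "O3": 4022, "Benzene": 5623,
    "Toluene": 8041, "Xylene": 18109, "AQI": 4681, "AQI_Bucket": 4681,
}
n_variables = 16       # from the dataset statistics
n_observations = 29531

# Share of all cells (rows x columns) that are missing.
total_missing = sum(missing_by_variable.values())
total_cells = n_variables * n_observations
missing_pct = 100 * total_missing / total_cells

# Share of rows missing for each variable.
per_variable_pct = {
    name: round(100 * count / n_observations, 1)
    for name, count in missing_by_variable.items()
}

print(total_missing, round(missing_pct, 1))
```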

Variables

City
Categorical

Distinct count: 26
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 230.8 KiB
Value               Count  Frequency (%)
Ahmedabad           2009   6.8%
Chennai             2009   6.8%
Bengaluru           2009   6.8%
Delhi               2009   6.8%
Mumbai              2009   6.8%
Lucknow             2009   6.8%
Hyderabad           2006   6.8%
Patna               1858   6.3%
Gurugram            1679   5.7%
Visakhapatnam       1462   5.0%
Amritsar            1221   4.1%
Jorapokhar          1169   4.0%
Jaipur              1114   3.8%
Thiruvananthapuram  1112   3.8%
Amaravati           951    3.2%
Brajrajnagar        938    3.2%
Talcher             925    3.1%
Kolkata             814    2.8%
Guwahati            502    1.7%
Coimbatore          386    1.3%
Shillong            310    1.0%
Chandigarh          304    1.0%
Bhopal              289    1.0%
Ernakulam           162    0.5%
Kochi               162    0.5%
 

Length

Max length: 18
Median length: 8
Mean length: 8.27574
Min length: 5

Overview of Unicode Properties

Unique unicode characters: 38
Unique unicode categories: 2
Unique unicode scripts: 1
Unique unicode blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
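
As an illustration of how such character properties can be tallied, the sketch below counts Unicode general categories with the standard library's `unicodedata` module. It is not the report's own implementation; the helper name and the three sample city names are chosen for illustration. The two-letter codes `Lu` and `Ll` correspond to the "Uppercase Letter" and "Lowercase Letter" categories listed in the tables that follow.

```python
import unicodedata
from collections import Counter


def unicode_category_counts(values):
    """Count the Unicode general category of every character in `values`."""
    counts = Counter()
    for value in values:
        for ch in value:
            counts[unicodedata.category(ch)] += 1
    return counts


# Three city names taken from the frequency table for City.
sample = ["Delhi", "Mumbai", "Kochi"]
category_counts = unicode_category_counts(sample)
print(category_counts)
```

Each city name starts with exactly one capital letter, which is consistent with the Uppercase Letter count (29531) equalling the number of observations.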

Most occurring characters

ValueCountFrequency (%) 
a4630318.9%
 
r210338.6%
 
u153966.3%
 
n152946.3%
 
h136785.6%
 
i136645.6%
 
e113534.6%
 
m109914.5%
 
d83343.4%
 
t83063.4%
 
l69412.8%
 
o66942.7%
 
b64102.6%
 
k56162.3%
 
g52402.1%
 
p51462.1%
 
A42941.8%
 
B32361.3%
 
c30961.3%
 
C26991.1%
 
s26831.1%
 
w26241.1%
 
J22830.9%
 
G21810.9%
 
v20630.8%
 
Other values (13)188337.7%
 

Most occurring categories

ValueCountFrequency (%) 
Lowercase Letter21486087.9%
 
Uppercase Letter2953112.1%
 

Most frequent Uppercase Letter characters

ValueCountFrequency (%) 
A429414.5%
 
B323611.0%
 
C26999.1%
 
J22837.7%
 
G21817.4%
 
T20376.9%
 
D20096.8%
 
L20096.8%
 
M20096.8%
 
H20066.8%
 
P18586.3%
 
V14625.0%
 
K9763.3%
 
S3101.0%
 
E1620.5%
 

Most frequent Lowercase Letter characters

ValueCountFrequency (%) 
a4630321.6%
 
r210339.8%
 
u153967.2%
 
n152947.1%
 
h136786.4%
 
i136646.4%
 
e113535.3%
 
m109915.1%
 
d83343.9%
 
t83063.9%
 
l69413.2%
 
o66943.1%
 
b64103.0%
 
k56162.6%
 
g52402.4%
 
p51462.4%
 
c30961.4%
 
s26831.2%
 
w26241.2%
 
v20631.0%
 
y20060.9%
 
j18760.9%
 
z1130.1%
 

Most occurring scripts

ValueCountFrequency (%) 
Latin244391100.0%
 

Most frequent Latin characters

ValueCountFrequency (%) 
a4630318.9%
 
r210338.6%
 
u153966.3%
 
n152946.3%
 
h136785.6%
 
i136645.6%
 
e113534.6%
 
m109914.5%
 
d83343.4%
 
t83063.4%
 
l69412.8%
 
o66942.7%
 
b64102.6%
 
k56162.3%
 
g52402.1%
 
p51462.1%
 
A42941.8%
 
B32361.3%
 
c30961.3%
 
C26991.1%
 
s26831.1%
 
w26241.1%
 
J22830.9%
 
G21810.9%
 
v20630.8%
 
Other values (13)188337.7%
 

Most occurring blocks

ValueCountFrequency (%) 
ASCII244391100.0%
 

Most frequent ASCII characters

ValueCountFrequency (%) 
a4630318.9%
 
r210338.6%
 
u153966.3%
 
n152946.3%
 
h136785.6%
 
i136645.6%
 
e113534.6%
 
m109914.5%
 
d83343.4%
 
t83063.4%
 
l69412.8%
 
o66942.7%
 
b64102.6%
 
k56162.3%
 
g52402.1%
 
p51462.1%
 
A42941.8%
 
B32361.3%
 
c30961.3%
 
C26991.1%
 
s26831.1%
 
w26241.1%
 
J22830.9%
 
G21810.9%
 
v20630.8%
 
Other values (13)188337.7%
 

Date
Categorical

HIGH CARDINALITY

Distinct count: 2009
Unique (%): 6.8%
Missing: 0
Missing (%): 0.0%
Memory size: 230.8 KiB
Value                Count  Frequency (%)
2020-04-26           26     0.1%
2020-06-22           26     0.1%
2020-03-31           26     0.1%
2020-05-11           26     0.1%
2020-06-01           26     0.1%
2020-05-17           26     0.1%
2020-06-09           26     0.1%
2020-05-05           26     0.1%
2020-05-01           26     0.1%
2020-04-19           26     0.1%
2020-04-21           26     0.1%
2020-06-03           26     0.1%
2020-06-17           26     0.1%
2020-05-07           26     0.1%
2020-05-02           26     0.1%
2020-05-08           26     0.1%
2020-05-28           26     0.1%
2020-06-20           26     0.1%
2020-03-30           26     0.1%
2020-05-16           26     0.1%
2020-06-14           26     0.1%
2020-04-20           26     0.1%
2020-04-06           26     0.1%
2020-05-24           26     0.1%
2020-03-23           26     0.1%
Other values (1984)  28881  97.8%
 

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10
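
Every value is exactly 10 characters long, which fits a fixed `YYYY-MM-DD` layout, so the column could be parsed as a date instead of being profiled as a categorical. The sketch below assumes that format, and it also assumes that the earliest and latest dates are the ones shown in the sample rows at the end of the report (2015-01-01 and 2020-07-01). Under those assumptions, the number of calendar days in the range matches the distinct count of 2009.

```python
from datetime import datetime

DATE_FORMAT = "%Y-%m-%d"  # assumed layout, consistent with the fixed length of 10

# Assumed range endpoints, taken from the first and last sample rows.
first_day = datetime.strptime("2015-01-01", DATE_FORMAT).date()
last_day = datetime.strptime("2020-07-01", DATE_FORMAT).date()

# Inclusive number of calendar days between the two endpoints.
n_days = (last_day - first_day).days + 1
print(n_days)
```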

Overview of Unicode Properties

Unique unicode characters: 11
Unique unicode categories: 2
Unique unicode scripts: 1
Unique unicode blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

ValueCountFrequency (%) 
07066623.9%
 
-5906220.0%
 
25160717.5%
 
14970016.8%
 
9124794.2%
 
8115603.9%
 
797993.3%
 
691983.1%
 
585292.9%
 
371012.4%
 
456091.9%
 

Most occurring categories

ValueCountFrequency (%) 
Decimal Number23624880.0%
 
Dash Punctuation5906220.0%
 

Most frequent Decimal Number characters

ValueCountFrequency (%) 
07066629.9%
 
25160721.8%
 
14970021.0%
 
9124795.3%
 
8115604.9%
 
797994.1%
 
691983.9%
 
585293.6%
 
371013.0%
 
456092.4%
 

Most frequent Dash Punctuation characters

ValueCountFrequency (%) 
-59062100.0%
 

Most occurring scripts

ValueCountFrequency (%) 
Common295310100.0%
 

Most frequent Common characters

ValueCountFrequency (%) 
07066623.9%
 
-5906220.0%
 
25160717.5%
 
14970016.8%
 
9124794.2%
 
8115603.9%
 
797993.3%
 
691983.1%
 
585292.9%
 
371012.4%
 
456091.9%
 

Most occurring blocks

ValueCountFrequency (%) 
ASCII295310100.0%
 

Most frequent ASCII characters

ValueCountFrequency (%) 
07066623.9%
 
-5906220.0%
 
25160717.5%
 
14970016.8%
 
9124794.2%
 
8115603.9%
 
797993.3%
 
691983.1%
 
585292.9%
 
371012.4%
 
456091.9%
 

PM2.5
Real number (ℝ≥0)

MISSING

Distinct count: 11716
Unique (%): 47.0%
Missing: 4598
Missing (%): 15.6%
Infinite: 0
Infinite (%): 0.0%
Mean: 67.4506
Minimum: 0.04
Maximum: 949.99
Zeros: 0
Zeros (%): 0.0%
Memory size: 230.8 KiB

Quantile statistics

Minimum: 0.04
5th percentile: 13.206
Q1: 28.82
Median: 48.57
Q3: 80.59
95th percentile: 193.96
Maximum: 949.99
Range: 949.95
Interquartile range (IQR): 51.77

Descriptive statistics

Standard deviation: 64.6614
Coefficient of variation (CV): 0.958649
Kurtosis: 21.1322
Mean: 67.4506
Median Absolute Deviation (MAD): 23.43
Skewness: 3.36996
Sum: 1.68175e+06
Variance: 4181.1
Monotonicity: Not monotonic
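
Several of these figures are derived from others in the same block. A short sketch, using the PM2.5 values above, of how range, IQR, coefficient of variation, and variance relate to the minimum, maximum, quartiles, mean, and standard deviation; the variable names are illustrative:

```python
# PM2.5 values copied from the statistics above.
minimum, maximum = 0.04, 949.99
q1, q3 = 28.82, 80.59
mean, std = 67.4506, 64.6614

value_range = maximum - minimum  # Range = Maximum - Minimum
iqr = q3 - q1                    # IQR = Q3 - Q1
cv = std / mean                  # CV = standard deviation / mean
variance = std ** 2              # Variance = standard deviation squared

print(f"range={value_range:.2f} iqr={iqr:.2f} cv={cv:.4f} variance={variance:.1f}")
```

The same relations hold for the other numeric variables in the report, up to rounding of the displayed values.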
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%) 
11190.1%
 
20.7512< 0.1%
 
27.8211< 0.1%
 
1510< 0.1%
 
18.8110< 0.1%
 
47.4310< 0.1%
 
11.8110< 0.1%
 
28.4510< 0.1%
 
29.7510< 0.1%
 
269< 0.1%
 
18.369< 0.1%
 
31.089< 0.1%
 
37.939< 0.1%
 
32.569< 0.1%
 
32.339< 0.1%
 
38.079< 0.1%
 
10.319< 0.1%
 
33.99< 0.1%
 
48.129< 0.1%
 
19.49< 0.1%
 
32.089< 0.1%
 
40.969< 0.1%
 
39.429< 0.1%
 
47.349< 0.1%
 
29.89< 0.1%
 
Other values (11691)2468783.6%
 
(Missing)459815.6%
 
ValueCountFrequency (%) 
0.041< 0.1%
 
0.161< 0.1%
 
0.241< 0.1%
 
0.281< 0.1%
 
0.981< 0.1%
 
0.991< 0.1%
 
1.141< 0.1%
 
1.191< 0.1%
 
1.251< 0.1%
 
1.391< 0.1%
 
ValueCountFrequency (%) 
949.991< 0.1%
 
917.771< 0.1%
 
916.671< 0.1%
 
914.941< 0.1%
 
914.641< 0.1%
 
894.751< 0.1%
 
868.661< 0.1%
 
858.731< 0.1%
 
832.81< 0.1%
 
821.421< 0.1%
 

PM10
Real number (ℝ≥0)

MISSING

Distinct count: 12571
Unique (%): 68.4%
Missing: 11140
Missing (%): 37.7%
Infinite: 0
Infinite (%): 0.0%
Mean: 118.127
Minimum: 0.01
Maximum: 1000
Zeros: 0
Zeros (%): 0.0%
Memory size: 230.8 KiB

Quantile statistics

Minimum: 0.01
5th percentile: 26.365
Q1: 56.255
Median: 95.68
Q3: 149.745
95th percentile: 303.34
Maximum: 1000
Range: 999.99
Interquartile range (IQR): 93.49

Descriptive statistics

Standard deviation: 90.6051
Coefficient of variation (CV): 0.767014
Kurtosis: 6.74787
Mean: 118.127
Median Absolute Deviation (MAD): 43.92
Skewness: 2.05319
Sum: 2.17248e+06
Variance: 8209.29
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%) 
949< 0.1%
 
33.817< 0.1%
 
109.676< 0.1%
 
43.16< 0.1%
 
102.176< 0.1%
 
72.046< 0.1%
 
87.026< 0.1%
 
39.466< 0.1%
 
84.086< 0.1%
 
20.536< 0.1%
 
112.365< 0.1%
 
35.435< 0.1%
 
60.165< 0.1%
 
98.755< 0.1%
 
92.45< 0.1%
 
63.235< 0.1%
 
66.955< 0.1%
 
57.425< 0.1%
 
61.475< 0.1%
 
86.735< 0.1%
 
35.015< 0.1%
 
63.425< 0.1%
 
62.825< 0.1%
 
44.955< 0.1%
 
56.095< 0.1%
 
Other values (12546)1825261.8%
 
(Missing)1114037.7%
 
ValueCountFrequency (%) 
0.011< 0.1%
 
0.021< 0.1%
 
0.031< 0.1%
 
0.042< 0.1%
 
0.061< 0.1%
 
0.071< 0.1%
 
0.132< 0.1%
 
0.142< 0.1%
 
0.161< 0.1%
 
0.172< 0.1%
 
ValueCountFrequency (%) 
10001< 0.1%
 
9852< 0.1%
 
917.081< 0.1%
 
847.411< 0.1%
 
802.871< 0.1%
 
796.881< 0.1%
 
768.161< 0.1%
 
763.581< 0.1%
 
761.911< 0.1%
 
743.981< 0.1%
 

NO
Real number (ℝ≥0)

MISSING

Distinct count: 5776
Unique (%): 22.3%
Missing: 3582
Missing (%): 12.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 17.5747
Minimum: 0.02
Maximum: 390.68
Zeros: 0
Zeros (%): 0.0%
Memory size: 230.8 KiB

Quantile statistics

Minimum: 0.02
5th percentile: 1.7
Q1: 5.63
Median: 9.89
Q3: 19.95
95th percentile: 61.19
Maximum: 390.68
Range: 390.66
Interquartile range (IQR): 14.32

Descriptive statistics

Standard deviation: 22.7858
Coefficient of variation (CV): 1.29651
Kurtosis: 25.1643
Mean: 17.5747
Median Absolute Deviation (MAD): 5.64
Skewness: 3.88317
Sum: 456047
Variance: 519.195
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%) 
5.93340.1%
 
8.78290.1%
 
7.78290.1%
 
0.92280.1%
 
0.97270.1%
 
1.94270.1%
 
0.9260.1%
 
7.97260.1%
 
2.89260.1%
 
5.23250.1%
 
5.95250.1%
 
5.28250.1%
 
0.88250.1%
 
8.91250.1%
 
4.57240.1%
 
7.23240.1%
 
8.89240.1%
 
2.87240.1%
 
8.62240.1%
 
3.63240.1%
 
7.94240.1%
 
7.72240.1%
 
2.81240.1%
 
3.1240.1%
 
2.98230.1%
 
Other values (5751)2530985.7%
 
(Missing)358212.1%
 
ValueCountFrequency (%) 
0.027< 0.1%
 
0.033< 0.1%
 
0.062< 0.1%
 
0.092< 0.1%
 
0.11< 0.1%
 
0.112< 0.1%
 
0.121< 0.1%
 
0.131< 0.1%
 
0.141< 0.1%
 
0.181< 0.1%
 
ValueCountFrequency (%) 
390.681< 0.1%
 
382.441< 0.1%
 
351.31< 0.1%
 
304.261< 0.1%
 
289.751< 0.1%
 
288.551< 0.1%
 
287.141< 0.1%
 
273.391< 0.1%
 
270.091< 0.1%
 
268.031< 0.1%
 

NO2
Real number (ℝ≥0)

MISSING

Distinct count: 7404
Unique (%): 28.5%
Missing: 3585
Missing (%): 12.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 28.5607
Minimum: 0.01
Maximum: 362.21
Zeros: 0
Zeros (%): 0.0%
Memory size: 230.8 KiB

Quantile statistics

Minimum: 0.01
5th percentile: 4.93
Q1: 11.75
Median: 21.69
Q3: 37.62
95th percentile: 74.125
Maximum: 362.21
Range: 362.2
Interquartile range (IQR): 25.87

Descriptive statistics

Standard deviation: 24.4747
Coefficient of variation (CV): 0.856939
Kurtosis: 11.2111
Mean: 28.5607
Median Absolute Deviation (MAD): 11.42
Skewness: 2.46456
Sum: 741035
Variance: 599.013
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%) 
10.58240.1%
 
9.42230.1%
 
9.14180.1%
 
9.44170.1%
 
10.21170.1%
 
10.09170.1%
 
9.47170.1%
 
7.14170.1%
 
9.24170.1%
 
11.26160.1%
 
10.1160.1%
 
9.41160.1%
 
10.76160.1%
 
10.15160.1%
 
10.06160.1%
 
11.56160.1%
 
13.9160.1%
 
10.99160.1%
 
7.21160.1%
 
10.65160.1%
 
9.91160.1%
 
17.95160.1%
 
10.8160.1%
 
10.14150.1%
 
14.19150.1%
 
Other values (7379)2552586.4%
 
(Missing)358512.1%
 
ValueCountFrequency (%) 
0.012< 0.1%
 
0.025< 0.1%
 
0.039< 0.1%
 
0.042< 0.1%
 
0.053< 0.1%
 
0.063< 0.1%
 
0.077< 0.1%
 
0.085< 0.1%
 
0.097< 0.1%
 
0.14< 0.1%
 
ValueCountFrequency (%) 
362.211< 0.1%
 
292.021< 0.1%
 
277.311< 0.1%
 
273.391< 0.1%
 
266.461< 0.1%
 
245.621< 0.1%
 
241.341< 0.1%
 
239.181< 0.1%
 
239.11< 0.1%
 
237.271< 0.1%
 

NOx
Real number (ℝ≥0)

MISSING
ZEROS

Distinct count: 8156
Unique (%): 32.2%
Missing: 4185
Missing (%): 14.2%
Infinite: 0
Infinite (%): 0.0%
Mean: 32.3091
Minimum: 0
Maximum: 467.63
Zeros: 740
Zeros (%): 2.5%
Memory size: 230.8 KiB

Quantile statistics

Minimum: 0
5th percentile: 2.4
Q1: 12.82
Median: 23.52
Q3: 40.1275
95th percentile: 96.3575
Maximum: 467.63
Range: 467.63
Interquartile range (IQR): 27.3075

Descriptive statistics

Standard deviation: 31.646
Coefficient of variation (CV): 0.979476
Kurtosis: 10.8363
Mean: 32.3091
Median Absolute Deviation (MAD): 12.69
Skewness: 2.56991
Sum: 818907
Variance: 1001.47
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%) 
07402.5%
 
4.222080.7%
 
6.241150.4%
 
4.3350.1%
 
2.21310.1%
 
4.95190.1%
 
4.14180.1%
 
4.47170.1%
 
4.97160.1%
 
16.7814< 0.1%
 
4.0514< 0.1%
 
12.714< 0.1%
 
13.2513< 0.1%
 
4.3913< 0.1%
 
17.8213< 0.1%
 
19.2513< 0.1%
 
15.1313< 0.1%
 
5.2813< 0.1%
 
6.213< 0.1%
 
20.9313< 0.1%
 
20.4813< 0.1%
 
4.8113< 0.1%
 
22.2413< 0.1%
 
15.0913< 0.1%
 
22.4613< 0.1%
 
Other values (8131)2393681.1%
 
(Missing)418514.2%
 
ValueCountFrequency (%) 
07402.5%
 
0.034< 0.1%
 
0.049< 0.1%
 
0.053< 0.1%
 
0.062< 0.1%
 
0.072< 0.1%
 
0.091< 0.1%
 
0.13< 0.1%
 
0.112< 0.1%
 
0.121< 0.1%
 
ValueCountFrequency (%) 
467.631< 0.1%
 
382.841< 0.1%
 
378.311< 0.1%
 
378.241< 0.1%
 
302.781< 0.1%
 
293.11< 0.1%
 
289.091< 0.1%
 
287.891< 0.1%
 
273.331< 0.1%
 
271.941< 0.1%
 

NH3
Real number (ℝ≥0)

MISSING

Distinct count: 5922
Unique (%): 30.8%
Missing: 10328
Missing (%): 35.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 23.4835
Minimum: 0.01
Maximum: 352.89
Zeros: 0
Zeros (%): 0.0%
Memory size: 230.8 KiB

Quantile statistics

Minimum: 0.01
5th percentile: 2.74
Q1: 8.58
Median: 15.85
Q3: 30.02
95th percentile: 63.427
Maximum: 352.89
Range: 352.88
Interquartile range (IQR): 21.44

Descriptive statistics

Standard deviation: 25.6843
Coefficient of variation (CV): 1.09372
Kurtosis: 27.9646
Mean: 23.4835
Median Absolute Deviation (MAD): 9.25
Skewness: 4.08399
Sum: 450953
Variance: 659.682
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%) 
6.29360.1%
 
6.32290.1%
 
6.31280.1%
 
6.3280.1%
 
6.28270.1%
 
6.27240.1%
 
10.46230.1%
 
6.59220.1%
 
3.66210.1%
 
6.33210.1%
 
6.6210.1%
 
6.62210.1%
 
6.57210.1%
 
6.63210.1%
 
10.42200.1%
 
6.34200.1%
 
6.61190.1%
 
6.25190.1%
 
6.71180.1%
 
6.64180.1%
 
3.65180.1%
 
6.58170.1%
 
11.99170.1%
 
11.7160.1%
 
6.67160.1%
 
Other values (5897)1866263.2%
 
(Missing)1032835.0%
 
ValueCountFrequency (%) 
0.012< 0.1%
 
0.026< 0.1%
 
0.041< 0.1%
 
0.051< 0.1%
 
0.061< 0.1%
 
0.082< 0.1%
 
0.11< 0.1%
 
0.114< 0.1%
 
0.123< 0.1%
 
0.132< 0.1%
 
ValueCountFrequency (%) 
352.891< 0.1%
 
328.891< 0.1%
 
323.481< 0.1%
 
309.041< 0.1%
 
303.531< 0.1%
 
302.081< 0.1%
 
301.281< 0.1%
 
301.181< 0.1%
 
297.641< 0.1%
 
296.431< 0.1%
 

CO
Real number (ℝ≥0)

MISSING
ZEROS

Distinct count: 1779
Unique (%): 6.5%
Missing: 2059
Missing (%): 7.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.2486
Minimum: 0
Maximum: 175.81
Zeros: 2328
Zeros (%): 7.9%
Memory size: 230.8 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0.51
Median: 0.89
Q3: 1.45
95th percentile: 8.0245
Maximum: 175.81
Range: 175.81
Interquartile range (IQR): 0.94

Descriptive statistics

Standard deviation: 6.96288
Coefficient of variation (CV): 3.09654
Kurtosis: 109.488
Mean: 2.2486
Median Absolute Deviation (MAD): 0.44
Skewness: 8.87832
Sum: 61773.5
Variance: 48.4818
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%) 
023287.9%
 
0.682090.7%
 
0.852080.7%
 
0.82050.7%
 
0.892030.7%
 
0.782000.7%
 
0.842000.7%
 
0.811990.7%
 
0.641980.7%
 
0.671940.7%
 
0.821930.7%
 
0.831930.7%
 
0.861930.7%
 
0.791930.7%
 
0.881910.6%
 
0.721890.6%
 
0.871860.6%
 
0.771830.6%
 
0.761820.6%
 
0.611800.6%
 
0.951800.6%
 
0.711800.6%
 
0.931770.6%
 
0.751760.6%
 
0.571760.6%
 
Other values (1754)2055669.6%
 
(Missing)20597.0%
 
ValueCountFrequency (%) 
023287.9%
 
0.01590.2%
 
0.02590.2%
 
0.03560.2%
 
0.04300.1%
 
0.05480.2%
 
0.06420.1%
 
0.07400.1%
 
0.08340.1%
 
0.09380.1%
 
ValueCountFrequency (%) 
175.811< 0.1%
 
145.321< 0.1%
 
134.851< 0.1%
 
132.471< 0.1%
 
132.071< 0.1%
 
124.011< 0.1%
 
119.681< 0.1%
 
119.31< 0.1%
 
118.021< 0.1%
 
1181< 0.1%
 

SO2
Real number (ℝ≥0)

MISSING

Distinct count: 4761
Unique (%): 18.5%
Missing: 3854
Missing (%): 13.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 14.532
Minimum: 0.01
Maximum: 193.86
Zeros: 0
Zeros (%): 0.0%
Memory size: 230.8 KiB

Quantile statistics

Minimum: 0.01
5th percentile: 2.63
Q1: 5.67
Median: 9.16
Q3: 15.22
95th percentile: 46.208
Maximum: 193.86
Range: 193.85
Interquartile range (IQR): 9.55

Descriptive statistics

Standard deviation: 18.1338
Coefficient of variation (CV): 1.24785
Kurtosis: 22.0671
Mean: 14.532
Median Absolute Deviation (MAD): 4.12
Skewness: 4.08366
Sum: 373138
Variance: 328.834
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%) 
5.74360.1%
 
6.12350.1%
 
4.65320.1%
 
5.81320.1%
 
6.61320.1%
 
5.53320.1%
 
5.57310.1%
 
5.95310.1%
 
6.47310.1%
 
5.13300.1%
 
5.87290.1%
 
5.8290.1%
 
5.59290.1%
 
6.88290.1%
 
6290.1%
 
6.64280.1%
 
5.58280.1%
 
5.66280.1%
 
5.43280.1%
 
5.86280.1%
 
8.71280.1%
 
5.45280.1%
 
6.42280.1%
 
6.44280.1%
 
4.93280.1%
 
Other values (4736)2493084.4%
 
(Missing)385413.1%
 
ValueCountFrequency (%) 
0.011< 0.1%
 
0.041< 0.1%
 
0.211< 0.1%
 
0.261< 0.1%
 
0.361< 0.1%
 
0.412< 0.1%
 
0.421< 0.1%
 
0.441< 0.1%
 
0.481< 0.1%
 
0.491< 0.1%
 
ValueCountFrequency (%) 
193.861< 0.1%
 
187.021< 0.1%
 
186.081< 0.1%
 
182.391< 0.1%
 
180.851< 0.1%
 
179.181< 0.1%
 
178.931< 0.1%
 
178.631< 0.1%
 
178.581< 0.1%
 
176.881< 0.1%
 

O3
Real number (ℝ≥0)

MISSING

Distinct count: 7699
Unique (%): 30.2%
Missing: 4022
Missing (%): 13.6%
Infinite: 0
Infinite (%): 0.0%
Mean: 34.4914
Minimum: 0.01
Maximum: 257.73
Zeros: 0
Zeros (%): 0.0%
Memory size: 230.8 KiB

Quantile statistics

Minimum: 0.01
5th percentile: 7.02
Q1: 18.86
Median: 30.84
Q3: 45.57
95th percentile: 74.142
Maximum: 257.73
Range: 257.72
Interquartile range (IQR): 26.71

Descriptive statistics

Standard deviation: 21.6949
Coefficient of variation (CV): 0.628995
Kurtosis: 3.42946
Mean: 34.4914
Median Absolute Deviation (MAD): 12.96
Skewness: 1.33012
Sum: 879842
Variance: 470.67
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%) 
16.48170.1%
 
23.6150.1%
 
22.14150.1%
 
19.6414< 0.1%
 
18.3314< 0.1%
 
32.0613< 0.1%
 
19.6813< 0.1%
 
22.9413< 0.1%
 
13.1413< 0.1%
 
43.7712< 0.1%
 
25.312< 0.1%
 
18.6612< 0.1%
 
31.9512< 0.1%
 
19.5812< 0.1%
 
23.5812< 0.1%
 
24.9112< 0.1%
 
28.2412< 0.1%
 
7.7112< 0.1%
 
20.7512< 0.1%
 
1912< 0.1%
 
30.8111< 0.1%
 
34.3611< 0.1%
 
29.7611< 0.1%
 
31.2111< 0.1%
 
17.0111< 0.1%
 
Other values (7674)2519585.3%
 
(Missing)402213.6%
 
ValueCountFrequency (%) 
0.014< 0.1%
 
0.027< 0.1%
 
0.032< 0.1%
 
0.043< 0.1%
 
0.052< 0.1%
 
0.063< 0.1%
 
0.071< 0.1%
 
0.18< 0.1%
 
0.112< 0.1%
 
0.121< 0.1%
 
ValueCountFrequency (%) 
257.731< 0.1%
 
200.411< 0.1%
 
193.311< 0.1%
 
186.071< 0.1%
 
177.071< 0.1%
 
175.041< 0.1%
 
172.281< 0.1%
 
169.361< 0.1%
 
169.351< 0.1%
 
165.481< 0.1%
 

Benzene
Real number (ℝ≥0)

MISSING
SKEWED
ZEROS

Distinct count: 1873
Unique (%): 7.8%
Missing: 5623
Missing (%): 19.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.28084
Minimum: 0
Maximum: 455.03
Zeros: 3802
Zeros (%): 12.9%
Memory size: 230.8 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0.12
Median: 1.07
Q3: 3.08
95th percentile: 9.72
Maximum: 455.03
Range: 455.03
Interquartile range (IQR): 2.96

Descriptive statistics

Standard deviation: 15.8111
Coefficient of variation (CV): 4.81923
Kurtosis: 530.171
Mean: 3.28084
Median Absolute Deviation (MAD): 1.06
Skewness: 21.3042
Sum: 78438.3
Variance: 249.992
Monotonicity: Not monotonic
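
Benzene is the only variable flagged as skewed. The report does not state which skewness estimator it uses; the sketch below assumes the adjusted Fisher-Pearson estimator, a common bias-corrected choice, and the function name is hypothetical. A long right tail, like the handful of Benzene values above 430 against a median of 1.07, pushes the statistic strongly positive.

```python
import math


def adjusted_skewness(values):
    """Adjusted Fisher-Pearson sample skewness (an assumed estimator)."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least three observations")
    mean = sum(values) / n
    # Second and third central moments (population form).
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    if m2 == 0:
        return 0.0  # constant data has no skew
    g1 = m3 / m2 ** 1.5
    # Small-sample bias correction.
    return g1 * math.sqrt(n * (n - 1)) / (n - 2)


symmetric = adjusted_skewness([1, 2, 3, 4, 5])
right_tailed = adjusted_skewness([1, 2, 3, 4, 10])
print(symmetric, right_tailed)
```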
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%) 
0380212.9%
 
0.033001.0%
 
0.022921.0%
 
0.012170.7%
 
0.041900.6%
 
0.051760.6%
 
0.091700.6%
 
21700.6%
 
0.11670.6%
 
0.081570.5%
 
0.061460.5%
 
0.111370.5%
 
0.121360.5%
 
0.161310.4%
 
0.131260.4%
 
0.071230.4%
 
0.221190.4%
 
0.21190.4%
 
0.141180.4%
 
0.181150.4%
 
0.151140.4%
 
0.171130.4%
 
0.281100.4%
 
0.191070.4%
 
0.231050.4%
 
Other values (1848)1644855.7%
 
(Missing)562319.0%
 
ValueCountFrequency (%) 
0380212.9%
 
0.012170.7%
 
0.022921.0%
 
0.033001.0%
 
0.041900.6%
 
0.051760.6%
 
0.061460.5%
 
0.071230.4%
 
0.081570.5%
 
0.091700.6%
 
ValueCountFrequency (%) 
455.031< 0.1%
 
454.851< 0.1%
 
449.381< 0.1%
 
448.591< 0.1%
 
445.831< 0.1%
 
443.631< 0.1%
 
438.011< 0.1%
 
435.91< 0.1%
 
435.091< 0.1%
 
432.941< 0.1%
 

Toluene
Real number (ℝ≥0)

MISSING
ZEROS

Distinct count: 3608
Unique (%): 16.8%
Missing: 8041
Missing (%): 27.2%
Infinite: 0
Infinite (%): 0.0%
Mean: 8.70097
Minimum: 0
Maximum: 454.85
Zeros: 2861
Zeros (%): 9.7%
Memory size: 230.8 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0.6
Median: 2.97
Q3: 9.15
95th percentile: 33.92
Maximum: 454.85
Range: 454.85
Interquartile range (IQR): 8.55

Descriptive statistics

Standard deviation: 19.9692
Coefficient of variation (CV): 2.29505
Kurtosis: 216.746
Mean: 8.70097
Median Absolute Deviation (MAD): 2.94
Skewness: 11.6661
Sum: 186984
Variance: 398.767
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%) 
028619.7%
 
0.021110.4%
 
0.031020.3%
 
0.05990.3%
 
0.04860.3%
 
1.1830.3%
 
6790.3%
 
0.08760.3%
 
0.06720.2%
 
0.01700.2%
 
0.16640.2%
 
0.07610.2%
 
0.21590.2%
 
6.01560.2%
 
0.09540.2%
 
0.25530.2%
 
0.18530.2%
 
0.22520.2%
 
0.11520.2%
 
1.11490.2%
 
0.13490.2%
 
5.99480.2%
 
0.1480.2%
 
0.15480.2%
 
0.2470.2%
 
Other values (3583)1705857.8%
 
(Missing)804127.2%
 
ValueCountFrequency (%) 
028619.7%
 
0.01700.2%
 
0.021110.4%
 
0.031020.3%
 
0.04860.3%
 
0.05990.3%
 
0.06720.2%
 
0.07610.2%
 
0.08760.3%
 
0.09540.2%
 
ValueCountFrequency (%) 
454.851< 0.1%
 
454.121< 0.1%
 
449.141< 0.1%
 
448.871< 0.1%
 
445.841< 0.1%
 
443.631< 0.1%
 
437.771< 0.1%
 
435.941< 0.1%
 
434.921< 0.1%
 
433.021< 0.1%
 

Xylene
Real number (ℝ≥0)

MISSING
ZEROS

Distinct count: 1561
Unique (%): 13.7%
Missing: 18109
Missing (%): 61.3%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.07013
Minimum: 0
Maximum: 170.37
Zeros: 1747
Zeros (%): 5.9%
Memory size: 230.8 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0.14
Median: 0.98
Q3: 3.35
95th percentile: 12.558
Maximum: 170.37
Range: 170.37
Interquartile range (IQR): 3.21

Descriptive statistics

Standard deviation: 6.32325
Coefficient of variation (CV): 2.0596
Kurtosis: 119.98
Mean: 3.07013
Median Absolute Deviation (MAD): 0.98
Skewness: 7.89152
Sum: 35067
Variance: 39.9835
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%) 
017475.9%
 
0.12550.9%
 
21420.5%
 
0.651200.4%
 
0.121080.4%
 
0.11930.3%
 
0.15800.3%
 
0.13800.3%
 
0.16770.3%
 
0.52760.3%
 
0.07720.2%
 
0.01680.2%
 
0.14660.2%
 
0.08620.2%
 
0.09620.2%
 
0.17580.2%
 
0.22570.2%
 
0.06560.2%
 
1.98550.2%
 
0.18540.2%
 
0.03520.2%
 
0.05520.2%
 
0.19510.2%
 
0.02500.2%
 
2.04470.2%
 
Other values (1536)778226.4%
 
(Missing)1810961.3%
 
ValueCountFrequency (%) 
017475.9%
 
0.01680.2%
 
0.02500.2%
 
0.03520.2%
 
0.04420.1%
 
0.05520.2%
 
0.06560.2%
 
0.07720.2%
 
0.08620.2%
 
0.09620.2%
 
ValueCountFrequency (%) 
170.371< 0.1%
 
137.451< 0.1%
 
125.181< 0.1%
 
116.621< 0.1%
 
109.231< 0.1%
 
105.761< 0.1%
 
94.481< 0.1%
 
89.71< 0.1%
 
84.721< 0.1%
 
81.261< 0.1%
 

AQI
Real number (ℝ≥0)

MISSING

Distinct count: 829
Unique (%): 3.3%
Missing: 4681
Missing (%): 15.9%
Infinite: 0
Infinite (%): 0.0%
Mean: 166.464
Minimum: 13
Maximum: 2049
Zeros: 0
Zeros (%): 0.0%
Memory size: 230.8 KiB

Quantile statistics

Minimum: 13
5th percentile: 50
Q1: 81
Median: 118
Q3: 208
95th percentile: 407
Maximum: 2049
Range: 2036
Interquartile range (IQR): 127

Descriptive statistics

Standard deviation: 140.697
Coefficient of variation (CV): 0.845209
Kurtosis: 21.4237
Mean: 166.464
Median Absolute Deviation (MAD): 48
Skewness: 3.39676
Sum: 4.13662e+06
Variance: 19795.5
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%) 
1022230.8%
 
1002220.8%
 
1062080.7%
 
702080.7%
 
781980.7%
 
981950.7%
 
1041920.7%
 
661920.7%
 
801900.6%
 
921870.6%
 
741870.6%
 
821850.6%
 
941840.6%
 
1031840.6%
 
901840.6%
 
861830.6%
 
771830.6%
 
1101800.6%
 
1011790.6%
 
681790.6%
 
961780.6%
 
761770.6%
 
1081760.6%
 
651760.6%
 
691750.6%
 
Other values (804)2012568.1%
 
(Missing)468115.9%
 
ValueCountFrequency (%) 
131< 0.1%
 
143< 0.1%
 
153< 0.1%
 
164< 0.1%
 
177< 0.1%
 
182< 0.1%
 
19270.1%
 
20290.1%
 
217< 0.1%
 
228< 0.1%
 
ValueCountFrequency (%) 
20491< 0.1%
 
19171< 0.1%
 
18421< 0.1%
 
17471< 0.1%
 
17191< 0.1%
 
16721< 0.1%
 
16461< 0.1%
 
16301< 0.1%
 
16131< 0.1%
 
15951< 0.1%
 

AQI_Bucket
Categorical

MISSING

Distinct count: 6
Unique (%): < 0.1%
Missing: 4681
Missing (%): 15.9%
Memory size: 230.8 KiB
Value         Count  Frequency (%)
Moderate      8829   29.9%
Satisfactory  8224   27.8%
Poor          2781   9.4%
Very Poor     2337   7.9%
Good          1341   4.5%
Severe        1338   4.5%
(Missing)     4681   15.9%
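
AQI_Bucket has the same number of missing values as AQI (4681), which suggests the bucket is derived from the index. The report does not give the cut-offs. The sketch below assumes the breakpoints of India's National Air Quality Index (0-50 Good, 51-100 Satisfactory, 101-200 Moderate, 201-300 Poor, 301-400 Very Poor, above 400 Severe), and the function name is hypothetical. These assumed thresholds agree with the sample rows in this report, for example AQI 50.0 labelled Good and AQI 54.0 labelled Satisfactory.

```python
import bisect
import math

# Assumed upper bounds (inclusive) for each bucket except the last.
UPPER_BOUNDS = [50, 100, 200, 300, 400]
LABELS = ["Good", "Satisfactory", "Moderate", "Poor", "Very Poor", "Severe"]


def aqi_bucket(aqi):
    """Map an AQI value to its bucket label; missing values stay missing."""
    if aqi is None or math.isnan(aqi):
        return None
    return LABELS[bisect.bisect_left(UPPER_BOUNDS, aqi)]


for value in (47.0, 50.0, 54.0, 100.0, 2049.0):
    print(value, aqi_bucket(value))
```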
 

Length

Max length: 12
Median length: 8
Mean length: 7.75158
Min length: 3

Overview of Unicode Properties

Unique unicode characters: 19
Unique unicode categories: 3
Unique unicode scripts: 2
Unique unicode blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

ValueCountFrequency (%) 
o2997113.1%
 
a2995813.1%
 
r2584611.3%
 
t2527711.0%
 
e2400910.5%
 
y105614.6%
 
d101704.4%
 
S95624.2%
 
n93624.1%
 
M88293.9%
 
i82243.6%
 
s82243.6%
 
f82243.6%
 
c82243.6%
 
P51182.2%
 
V23371.0%
 
23371.0%
 
G13410.6%
 
v13380.6%
 

Most occurring categories

ValueCountFrequency (%) 
Lowercase Letter19938887.1%
 
Uppercase Letter2718711.9%
 
Space Separator23371.0%
 

Most frequent Lowercase Letter characters

ValueCountFrequency (%) 
o2997115.0%
 
a2995815.0%
 
r2584613.0%
 
t2527712.7%
 
e2400912.0%
 
y105615.3%
 
d101705.1%
 
n93624.7%
 
i82244.1%
 
s82244.1%
 
f82244.1%
 
c82244.1%
 
v13380.7%
 

Most frequent Uppercase Letter characters

ValueCountFrequency (%) 
S956235.2%
 
M882932.5%
 
P511818.8%
 
V23378.6%
 
G13414.9%
 

Most frequent Space Separator characters

ValueCountFrequency (%) 
2337100.0%
 

Most occurring scripts

ValueCountFrequency (%) 
Latin22657599.0%
 
Common23371.0%
 

Most frequent Latin characters

ValueCountFrequency (%) 
o2997113.2%
 
a2995813.2%
 
r2584611.4%
 
t2527711.2%
 
e2400910.6%
 
y105614.7%
 
d101704.5%
 
S95624.2%
 
n93624.1%
 
M88293.9%
 
i82243.6%
 
s82243.6%
 
f82243.6%
 
c82243.6%
 
P51182.3%
 
V23371.0%
 
G13410.6%
 
v13380.6%
 

Most frequent Common characters

ValueCountFrequency (%) 
2337100.0%
 

Most occurring blocks

ValueCountFrequency (%) 
ASCII228912100.0%
 

Most frequent ASCII characters

ValueCountFrequency (%) 
o2997113.1%
 
a2995813.1%
 
r2584611.3%
 
t2527711.0%
 
e2400910.5%
 
y105614.6%
 
d101704.4%
 
S95624.2%
 
n93624.1%
 
M88293.9%
 
i82243.6%
 
s82243.6%
 
f82243.6%
 
c82243.6%
 
P51182.2%
 
V23371.0%
 
23371.0%
 
G13410.6%
 
v13380.6%
 

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the steepness of the line (its angle to the x-axis) does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
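
A minimal standard-library sketch of this calculation, with a check of the location and scale invariance; the function name and the toy data are illustrative and not taken from the report:

```python
import math


def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The 1/n factors of the covariance and variances cancel, so sums suffice.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]
r = pearson_r(xs, ys)

# Shifting and positively rescaling one variable leaves r unchanged.
r_rescaled = pearson_r([3 * x + 5 for x in xs], ys)
print(r, r_rescaled)
```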

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
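
The rank-based calculation can be sketched as follows. This is an illustrative standard-library version, assuming tied values receive the average of their ranks; the function names are hypothetical:

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Covariance of the rank variables over the product of their standard deviations."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# A nonlinear but strictly increasing relationship.
rho = spearman_rho([1, 2, 3, 4], [1, 4, 9, 16])
print(rho)
```

For a strictly increasing relationship such as y = x², the ranks of X and Y coincide, so ρ reaches its maximum even though the relationship is not linear.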

Kendall's τ

Similar to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
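
The pair-counting definition above corresponds to the variant usually called tau-a. The sketch below implements that definition directly. Tied pairs count as neither concordant nor discordant, and the report may use a tie-adjusted variant instead; the function name is illustrative.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total number of pairs."""
    concordant = discordant = 0
    pairs = list(combinations(range(len(xs)), 2))
    for i, j in pairs:
        # Positive product: both variables move in the same direction.
        sign = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / len(pairs)


# Two concordant pairs and one discordant pair out of three.
tau = kendall_tau_a([1, 2, 3], [1, 3, 2])
print(tau)
```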

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available for the phik package.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
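
A sketch of a bias-corrected Cramér's V computed from a contingency table of counts, following the correction as it is commonly stated for Bergsma's estimator. This is an illustration and not the report's own code; the function name and example tables are hypothetical.

```python
import math


def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for a contingency table (list of rows of counts)."""
    n = sum(sum(row) for row in table)
    r, k = len(table), len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]

    # Pearson chi-squared statistic against the independence model.
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                chi2 += (observed - expected) ** 2 / expected

    phi2 = chi2 / n
    # Correct phi^2 and the effective table dimensions for bias.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    k_corr = k - (k - 1) ** 2 / (n - 1)
    r_corr = r - (r - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    if denom <= 0:
        return 0.0
    return math.sqrt(phi2_corr / denom)


perfect = cramers_v_corrected([[10, 0], [0, 10]])
independent = cramers_v_corrected([[5, 5], [5, 5]])
print(perfect, independent)
```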

Missing values

Sample

First rows

    City       Date        PM2.5  PM10  NO      NO2    NOx     NH3  CO      SO2    O3      Benzene  Toluene  Xylene  AQI  AQI_Bucket
0   Ahmedabad  2015-01-01  NaN    NaN   0.92    18.22  17.15   NaN  0.92    27.64  133.36  0.00     0.02     0.00    NaN  NaN
1   Ahmedabad  2015-01-02  NaN    NaN   0.97    15.69  16.46   NaN  0.97    24.55  34.06   3.68     5.50     3.77    NaN  NaN
2   Ahmedabad  2015-01-03  NaN    NaN   17.40   19.30  29.70   NaN  17.40   29.07  30.70   6.80     16.40    2.25    NaN  NaN
3   Ahmedabad  2015-01-04  NaN    NaN   1.70    18.48  17.97   NaN  1.70    18.59  36.08   4.43     10.14    1.00    NaN  NaN
4   Ahmedabad  2015-01-05  NaN    NaN   22.10   21.42  37.76   NaN  22.10   39.33  39.31   7.01     18.89    2.78    NaN  NaN
5   Ahmedabad  2015-01-06  NaN    NaN   45.41   38.48  81.50   NaN  45.41   45.76  46.51   5.42     10.83    1.93    NaN  NaN
6   Ahmedabad  2015-01-07  NaN    NaN   112.16  40.62  130.77  NaN  112.16  32.28  33.47   0.00     0.00     0.00    NaN  NaN
7   Ahmedabad  2015-01-08  NaN    NaN   80.87   36.74  96.75   NaN  80.87   38.54  31.89   0.00     0.00     0.00    NaN  NaN
8   Ahmedabad  2015-01-09  NaN    NaN   29.16   31.00  48.00   NaN  29.16   58.68  25.75   0.00     0.00     0.00    NaN  NaN
9   Ahmedabad  2015-01-10  NaN    NaN   NaN     7.04   0.00    NaN  NaN     8.29   4.55    0.00     0.00     0.00    NaN  NaN

Last rows

       City           Date        PM2.5  PM10    NO    NO2    NOx    NH3    CO    SO2    O3     Benzene  Toluene  Xylene  AQI    AQI_Bucket
29521  Visakhapatnam  2020-06-22  33.17  108.22  5.58  42.45  27.06  13.70  0.73  13.65  34.85  3.99     10.24    2.32    95.0   Satisfactory
29522  Visakhapatnam  2020-06-23  25.40  83.38   2.76  34.09  19.92  13.13  0.54  10.40  43.27  2.88     12.03    1.33    100.0  Satisfactory
29523  Visakhapatnam  2020-06-24  34.36  90.90   1.22  23.38  13.12  14.45  0.56  10.92  35.12  2.99     3.15     1.60    86.0   Satisfactory
29524  Visakhapatnam  2020-06-25  13.45  58.54   2.30  21.60  13.09  12.27  0.41  8.19   29.38  1.28     5.64     0.92    77.0   Satisfactory
29525  Visakhapatnam  2020-06-26  7.63   32.27   5.91  23.27  17.19  11.15  0.46  6.87   19.90  1.45     5.37     1.45    47.0   Good
29526  Visakhapatnam  2020-06-27  15.02  50.94   7.68  25.06  19.54  12.47  0.47  8.55   23.30  2.24     12.07    0.73    41.0   Good
29527  Visakhapatnam  2020-06-28  24.38  74.09   3.42  26.06  16.53  11.99  0.52  12.72  30.14  0.74     2.21     0.38    70.0   Satisfactory
29528  Visakhapatnam  2020-06-29  22.91  65.73   3.45  29.53  18.33  10.71  0.48  8.42   30.96  0.01     0.01     0.00    68.0   Satisfactory
29529  Visakhapatnam  2020-06-30  16.64  49.97   4.05  29.26  18.80  10.03  0.52  9.84   28.30  0.00     0.00     0.00    54.0   Satisfactory
29530  Visakhapatnam  2020-07-01  15.00  66.00   0.40  26.85  14.05  5.20   0.59  2.10   17.05  NaN      NaN      NaN     50.0   Good